User dictionary term criteria conditions

ABSTRACT

Techniques are disclosed for processing an abstract query which includes a dictionary term criteria condition. The dictionary term criteria condition is used to specify a set of one or more keywords, each of which should appear in a distinct document (of a defined set of documents) in order for the condition to be satisfied. In one embodiment, a user defines an abstract query by specifying a model entity (a logical focus for a query used to identify a set of documents associated with the model entity), logical fields (specifying query conditions and information to be returned), and a set of terms for a dictionary term criteria condition.

BACKGROUND OF THE INVENTION

1. Field of the Invention

Embodiments of the present invention generally relate to data processing, and more particularly, to techniques for composing queries with dictionary term criteria conditions.

2. Description of the Related Art

Databases are well known systems for storing, searching, and retrieving information stored in computer systems. A particularly common type of database is the relational database, which stores data using a set of tables that may be reorganized and accessed in a number of different ways. Users access information in relational databases using a relational database management system (DBMS).

Each table in a relational database includes a set of one or more columns. Each column typically specifies a name and a data type (e.g., integer, float, string, etc.), and may be used to store a common element of data. For example, in a table storing data about patients treated at a hospital, each patient might be referenced using a patient identification number stored in a “patient ID” column. Reading across the rows of such a table would provide data about a particular patient. Tables that share at least one attribute in common are said to be “related.” Further, tables without a common attribute may be related through other tables that do share common attributes. A path between two tables is often referred to as a “join,” and columns from tables related through a join may be combined to form a new table returned as a set of query results.
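As a concrete illustration of a join, the following minimal sketch uses Python's built-in sqlite3 module with two hypothetical tables (demographics and tests, not taken from this disclosure) that share a patient_id column. Joining on that shared attribute combines columns from both tables into a single result table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two hypothetical tables related through the shared patient_id attribute.
conn.executescript("""
    CREATE TABLE demographics (patient_id INTEGER PRIMARY KEY, f_name TEXT);
    CREATE TABLE tests (patient_id INTEGER, test_type TEXT, value REAL);
    INSERT INTO demographics VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO tests VALUES (1, 'hemoglobin', 22.0), (2, 'hemoglobin', 14.5);
""")

# The join combines columns from both tables into one result table.
rows = conn.execute("""
    SELECT d.f_name, t.test_type, t.value
    FROM demographics AS d
    JOIN tests AS t ON t.patient_id = d.patient_id
    ORDER BY d.patient_id
""").fetchall()
print(rows)  # [('Alice', 'hemoglobin', 22.0), ('Bob', 'hemoglobin', 14.5)]
```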

Queries of a relational database may specify which columns to retrieve data from, how to join the columns together, and conditions (predicates) that must be satisfied for a particular data item to be included in a query result table. Current relational databases require that queries be composed in complex query languages. Today, the most widely used query language is Structured Query Language (SQL); however, other query languages are also used. An SQL query is composed from one or more clauses, each set off by a keyword. Well-known SQL keywords include SELECT, WHERE, FROM, HAVING, ORDER BY, and GROUP BY. Composing a proper SQL query requires that a user understand both the structure and content of the relational database and the complex syntax of the SQL query language (or other query language). This complexity generally makes it difficult for average users to compose queries of a relational database.

Because of this complexity, users often turn to database query applications to assist them in composing queries of a database. One technique for managing the complexity of a relational database, and of the SQL query language, is to use database abstraction techniques. Commonly assigned U.S. Pat. No. 6,996,558 (the '558 patent) entitled “Application Portability and Extensibility through Database Schema and Query Abstraction,” discloses techniques for constructing a database abstraction model over an underlying physical database.

The '558 patent discloses embodiments of a database abstraction model constructed from logical fields that map to data stored in the underlying physical database. Each logical field defines an access method that specifies a location (i.e., a table and column) in the underlying database from which to retrieve data. Users compose an abstract query by selecting logical fields and specifying conditions. The operators available for composing conditions in an abstract query generally include the same operators available in SQL (e.g., comparison operators such as =, >, <, >=, and <=, and logical operators such as AND, OR, and NOT). Data is retrieved from the physical database by generating a resolved query (e.g., an SQL statement) from the abstract query. Because the database abstraction model is tied to neither the syntax nor the semantics of the physical database, additional capabilities may be provided by the database abstraction model without having to modify the underlying database. Thus, the database abstraction model provides a platform for additional enhancements that allow users to compose meaningful queries easily, without having to disturb existing database installations.
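The sketch below illustrates, under simplifying assumptions, how logical fields whose access methods name a table and column might be resolved into an SQL statement. The LogicalField class, the Age field mapping, and all operator codes other than GT (which appears in Table I) are hypothetical; the FirstName to Demographics.f_name mapping mirrors the simple access method example given later in this description. Only single-table queries whose conditions are combined with AND are handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalField:
    """A logical field whose (simple) access method names one table and column."""
    name: str
    table: str
    column: str


# Hypothetical abstract operator codes mapped to SQL comparison operators.
SQL_OPERATORS = {"EQ": "=", "GT": ">", "LT": "<", "GE": ">=", "LE": "<="}


def resolve_query(result_fields, conditions):
    """Resolve logical fields and (field, op, value) conditions into parameterized SQL.

    Conditions are combined with AND; all fields must map to the same table.
    """
    tables = {f.table for f in result_fields} | {f.table for f, _, _ in conditions}
    if len(tables) != 1:
        raise ValueError("this sketch only handles fields mapped to a single table")
    (table,) = tables
    select_list = ", ".join(f"{f.table}.{f.column}" for f in result_fields)
    predicates = [f"{f.table}.{f.column} {SQL_OPERATORS[op]} ?" for f, op, _ in conditions]
    params = [value for _, _, value in conditions]
    sql = f"SELECT {select_list} FROM {table}"
    if predicates:
        sql += " WHERE " + " AND ".join(predicates)
    return sql, params


first_name = LogicalField("FirstName", "Demographics", "f_name")
age = LogicalField("Age", "Demographics", "age")  # hypothetical mapping
sql, params = resolve_query([first_name], [(age, "GT", 40)])
print(sql)     # SELECT Demographics.f_name FROM Demographics WHERE Demographics.age > ?
print(params)  # [40]
```

Because the abstract query names only logical fields, remapping FirstName to a different table or column would change the generated SQL without changing the abstract query itself.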

However, even though the database abstraction model can simplify the use of an underlying database, it can nonetheless be very complex, particularly when it includes a large number of logical fields. One approach to simplifying a database abstraction model is the use of model entities, which provide an entity focus for abstract queries. For example, commonly assigned U.S. Pat. No. 7,054,877 (the '877 patent) entitled “Dealing with Composite Data through Data Model Entities” discloses the use of model entities to provide a focus for abstract queries. The '877 patent discloses query interfaces configured to enable a user to compose abstract queries from logical fields of the database abstraction model, and to specify a model entity to provide a focus for the composed abstract query.

SUMMARY OF THE INVENTION

One embodiment of the invention provides a computer-implemented method for processing an abstract query of an underlying physical database. The method may generally include providing a database abstraction model, wherein the database abstraction model provides (i) a plurality of logical fields that each specify an access method defining a method for accessing data associated with a respective logical field, and (ii) a plurality of model entities, wherein each model entity specifies a set of logical fields that map to data related to a respective model entity and specifies an identifier in the underlying database used to identify instances of the respective model entity. The method may also include receiving an abstract query composed from one or more logical fields of the database abstraction model. At least a first logical field in the abstract query is specified in a dictionary term criteria condition, and the dictionary term criteria condition includes a list of one or more keywords. Further, the access method specified by the first logical field maps the first logical field to a plurality of documents related to a given instance of the model entity, and the dictionary term criteria condition is evaluated by determining whether the plurality of documents related to the given instance of the model entity includes, for each of the one or more keywords, at least one distinct document containing that keyword.

The method may also include generating, from the abstract query, a resolved query of the underlying physical database and storing the resolved query for execution against the underlying physical database.

Another embodiment of the invention includes a computer-readable storage medium containing a program which, when executed on a processor, performs an operation for processing an abstract query of an underlying physical database. The operation may generally include providing a database abstraction model, wherein the database abstraction model provides (i) a plurality of logical fields that each specify an access method defining a method for accessing data associated with a respective logical field, and (ii) a plurality of model entities, wherein each model entity specifies a set of logical fields that map to data related to a respective model entity and specifies an identifier in the underlying database used to identify instances of the respective model entity. The operation may also include receiving an abstract query composed from one or more logical fields of the database abstraction model. At least a first logical field in the abstract query is specified in a dictionary term criteria condition, and the dictionary term criteria condition includes a list of one or more keywords. Further, the access method specified by the first logical field maps the first logical field to a plurality of documents related to a given instance of the model entity, and the dictionary term criteria condition is evaluated by determining whether the plurality of documents related to the given instance of the model entity includes, for each of the one or more keywords, at least one distinct document containing that keyword.

The operation may also include generating, from the abstract query, a resolved query of the underlying physical database and storing the resolved query for execution against the underlying physical database.

Still another embodiment of the invention includes a system having a processor and a memory storing an application which, when executed by the processor, is configured to perform an operation for processing an abstract query of an underlying physical database. The operation may generally include providing a database abstraction model, wherein the database abstraction model provides (i) a plurality of logical fields that each specify an access method defining a method for accessing data associated with a respective logical field, and (ii) a plurality of model entities, wherein each model entity specifies a set of logical fields that map to data related to a respective model entity and specifies an identifier in the underlying database used to identify instances of the respective model entity. The operation may also include receiving an abstract query composed from one or more logical fields of the database abstraction model. At least a first logical field in the abstract query is specified in a dictionary term criteria condition, and the dictionary term criteria condition includes a list of one or more keywords. Further, the access method specified by the first logical field maps the first logical field to a plurality of documents related to a given instance of the model entity, and the dictionary term criteria condition is evaluated by determining whether the plurality of documents related to the given instance of the model entity includes, for each of the one or more keywords, at least one distinct document containing that keyword.

The operation may also include generating, from the abstract query, a resolved query of the underlying physical database and storing the resolved query for execution against the underlying physical database.

BRIEF DESCRIPTION OF THE DRAWINGS

So that the manner in which the above recited features, advantages and objects of the present invention are attained and can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to the embodiments thereof which are illustrated in the appended drawings.

It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.

FIG. 1 illustrates a network environment using a client-server configuration, according to one embodiment of the invention;

FIGS. 2A-2D illustrate a logical view of a database abstraction model constructed over an underlying physical database, according to one embodiment of the invention;

FIG. 3 illustrates a relational view of software components for executing an abstract query, according to one embodiment of the invention;

FIG. 4 is a flow diagram illustrating a method for composing a dictionary term criteria condition, according to one embodiment of the invention;

FIG. 5 is a flow diagram illustrating a method for composing an abstract query, according to one embodiment of the invention;

FIG. 6 illustrates a graphical user interface of a query application configured for composing a dictionary term criteria condition, according to one embodiment of the invention;

FIGS. 7A-7B illustrate an example database query generated in response to an abstract query which includes a dictionary term criteria condition, according to one embodiment of the invention;

FIG. 8 illustrates a graphical user interface of a query application configured for displaying results generated by executing a query with a dictionary term criteria condition, according to one embodiment of the invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Embodiments of the invention allow users to compose queries which include a user dictionary term criteria condition. The dictionary term criteria condition is used to specify a set of one or more keywords, each of which should appear in a distinct document in order for the condition to be satisfied. In one embodiment, a user defines an abstract query by specifying a model entity (a logical focus for a query), logical fields (specifying query conditions and information to be returned), and a set of terms for a dictionary term criteria condition. The model entity defines a focus of the query, e.g., a query to return information about “patients” in a database of medical records. In such a case, the dictionary term criteria condition could be evaluated using a collection of patient visit notes. For a given patient, if each of the terms specified for the dictionary term criteria condition appears in at least one distinct note, then information requested about that patient (as specified by the query) is included in the query results.
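One reading of this evaluation rule can be sketched in a few lines of Python. The sketch assumes that case-insensitive substring matching stands in for whatever text search the underlying database provides, and that the notes for each patient have already been gathered; the function names and sample notes are hypothetical. Each term may be satisfied by a different note, which is what distinguishes this condition from requiring all terms in a single document.

```python
def term_appears(term: str, document: str) -> bool:
    # Case-insensitive substring test; a stand-in for a real text-search facility.
    return term.lower() in document.lower()


def dictionary_term_condition(documents: list[str], terms: list[str]) -> bool:
    """True if every term appears in at least one of the documents.

    Different terms may be satisfied by different documents.
    """
    return all(any(term_appears(t, d) for d in documents) for t in terms)


def all_terms_in_single_document(documents: list[str], terms: list[str]) -> bool:
    """Contrast: the stricter test that one document must contain every term."""
    return any(all(term_appears(t, d) for t in terms) for d in documents)


def matching_patients(notes_by_patient: dict[int, list[str]], terms: list[str]) -> list[int]:
    """Return IDs of model-entity instances (patients) whose notes satisfy the condition."""
    return sorted(pid for pid, notes in notes_by_patient.items()
                  if dictionary_term_condition(notes, terms))


notes_by_patient = {
    101: ["Visit 1: diagnosed with asthma.", "Visit 2: diagnosed with diabetes."],
    102: ["Visit 1: diagnosed with asthma."],
}
terms = ["asthma", "diabetes"]
print(matching_patients(notes_by_patient, terms))  # [101]
print(all_terms_in_single_document(notes_by_patient[101], terms))  # False
```

Under a stricter reading, in which every keyword must be matched to a different document, a bipartite-matching step would be needed instead; this sketch does not attempt that.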

For example, a user may want to identify patients that have been diagnosed with ‘disease 1, disease 2 . . . disease N,’ where the information about diagnosed diseases is contained in multiple sources identified as patient visit notes. Such a request is an example of a dictionary term criteria condition. Patients often make multiple doctor visits, and different diseases may be diagnosed at different times by different doctors; thus, the diagnoses are located in different documents. To receive an answer to the request, the user builds an abstract query that identifies the model entity ‘patient’ and includes the terms ‘disease 1, disease 2 . . . disease N.’ According to the techniques described herein, this abstract query is transformed and executed, generating a list of patients that have been diagnosed at some point in time with each of diseases 1-N based on one or more documents, where each of the one or more documents contains information about a patient being diagnosed with at least one disease on the list.
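The resolved query for such a condition could take many forms; FIGS. 7A-7B illustrate one example, and the sketch below is not that query. As an assumption-laden alternative only, the Python below generates one correlated EXISTS subquery per keyword, so that each keyword may be found in a different visit note for the same patient, and runs it against an in-memory SQLite database. The table names patients and visit_notes, the column names, the sample notes, and the use of LIKE for term matching are all hypothetical.

```python
import sqlite3


def dictionary_term_sql(keywords):
    """Build SQL returning IDs of patients whose visit notes, taken together,
    mention every keyword: one correlated EXISTS subquery per keyword."""
    exists_clauses = [
        "EXISTS (SELECT 1 FROM visit_notes n "
        "WHERE n.patient_id = p.patient_id AND n.note_text LIKE ?)"
        for _ in keywords
    ]
    sql = "SELECT p.patient_id FROM patients p"
    if exists_clauses:
        sql += " WHERE " + " AND ".join(exists_clauses)
    sql += " ORDER BY p.patient_id"
    # Substring match; SQLite's LIKE is case-insensitive for ASCII by default.
    params = [f"%{kw}%" for kw in keywords]
    return sql, params


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (patient_id INTEGER PRIMARY KEY);
    CREATE TABLE visit_notes (patient_id INTEGER, note_text TEXT);
    INSERT INTO patients VALUES (1), (2), (3);
    INSERT INTO visit_notes VALUES
        (1, 'Diagnosed with asthma'),
        (1, 'Follow-up visit: diagnosed with diabetes'),
        (2, 'Diagnosed with asthma'),
        (3, 'Diagnosed with asthma and diabetes');
""")

sql, params = dictionary_term_sql(["asthma", "diabetes"])
matching_ids = [row[0] for row in conn.execute(sql, params)]
print(matching_ids)  # [1, 3]
```

Patient 3 is returned because a single note mentions both terms; the per-keyword EXISTS test does not require the matching notes to differ. Keywords containing the LIKE wildcards % or _ would need escaping.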

In the following, reference is made to embodiments of the invention. However, it should be understood that the invention is not limited to specific described embodiments. Instead, any combination of the following features and elements, whether related to different embodiments or not, is contemplated to implement and practice the invention. Furthermore, in various embodiments the invention provides numerous advantages over the prior art. However, although embodiments of the invention may achieve advantages over other possible solutions and/or over the prior art, whether or not a particular advantage is achieved by a given embodiment is not limiting of the invention. Thus, the following aspects, features, embodiments and advantages are merely illustrative and are not considered elements or limitations of the appended claims except where explicitly recited in a claim(s). Likewise, reference to “the invention” shall not be construed as a generalization of any inventive subject matter disclosed herein and shall not be considered to be an element or limitation of the appended claims except where explicitly recited in a claim(s).

One embodiment of the invention is implemented as a program product for use with a computing system. The program(s) of the program product define functions of the embodiments (including the methods described herein) and can be contained on a variety of computer-readable storage media. Illustrative computer-readable storage media include, but are not limited to: (i) non-writable storage media (e.g., read-only memory devices within a computer such as CD-ROM disks readable by a CD-ROM drive) on which information is permanently stored; (ii) writable storage media (e.g., floppy disks within a diskette drive or hard-disk drive) on which alterable information is stored. Such computer-readable storage media, when carrying computer-readable instructions that direct the functions of the present invention, are embodiments of the present invention. Other media include communications media through which information is conveyed to a computer, such as through a computer or telephone network, including wireless communications networks. The latter embodiment specifically includes transmitting information to/from the Internet and other networks. Such communications media, when carrying computer-readable instructions that direct the functions of the present invention, are embodiments of the present invention. Broadly, computer-readable storage media and communications media may be referred to herein as computer-readable media.

In general, the routines executed to implement the embodiments of the invention may be part of an operating system or a specific application, component, program, module, object, or sequence of instructions. The computer program of the present invention typically comprises a multitude of instructions that will be translated by the native computer into a machine-readable format and hence executable instructions. Also, programs comprise variables and data structures that either reside locally to the program or are found in memory or on storage devices. In addition, various programs described hereinafter may be identified based upon the application for which they are implemented in a specific embodiment of the invention. However, it should be appreciated that any particular program nomenclature that follows is used merely for convenience, and thus the invention should not be limited to use solely in any specific application identified and/or implied by such nomenclature.

FIG. 1 illustrates a network environment 100 using a client-server configuration, according to one embodiment of the invention. Client computer systems 105₁₋ₙ include an interface that enables network communications with other systems over network 104. The network 104 may be a local area network where both the client system 105 and server system 110 reside in the same general location, or may be network connections between geographically distributed systems, including network connections over the Internet. Client system 105 generally includes a central processing unit (CPU) connected by a bus to memory and storage (not shown). Each client system 105 typically runs an operating system configured to manage interaction between the computer hardware and the higher-level software applications running on the client system 105 (e.g., a Linux® distribution, a version of the Microsoft Windows® operating system, IBM's AIX® or OS/400®, FreeBSD, and the like). (“Linux” is a registered trademark of Linus Torvalds in the United States and other countries.)

The server system 110 may include hardware components similar to those used by the client system 105. Accordingly, the server system 110 generally includes a CPU, a memory, and a storage device, coupled by a bus (not shown). The server system 110 also runs an operating system (e.g., a Linux® distribution, Microsoft Windows®, IBM's OS/400® or AIX®, FreeBSD, and the like).

The network environment 100 illustrated in FIG. 1, however, is merely an example of one computing environment. Embodiments of the present invention may be implemented using other environments, regardless of whether the computer systems are complex multi-user computing systems, such as a cluster of individual computers connected by a high-speed network, single-user workstations, or network appliances lacking non-volatile storage. Further, the software applications illustrated in FIG. 1 and described herein may be implemented using computer software applications executing on existing computer systems, e.g., desktop computers, server computers, laptop computers, tablet computers, and the like. However, the software applications described herein are not limited to any currently existing computing environment or programming language, and may be adapted to take advantage of new computing systems as they become available.

In one embodiment, users interact with the server system 110 using a graphical user interface (GUI) provided by a user interface 115. In a particular embodiment, GUI content may comprise HTML documents (i.e., web pages) rendered on a client computer system 105 using web browser 122. In such an embodiment, the server system 110 includes a Hypertext Transfer Protocol (HTTP) server 118 (e.g., a web server such as the open source Apache web server program or IBM's WebSphere® program) configured to respond to HTTP requests from the client system 105 and to transmit HTML documents to client system 105. The web pages themselves may be static documents stored on server system 110 or generated dynamically using application server 112 interacting with web server 118 to service HTTP requests. Alternatively, client application 120 may comprise a database front end, or a query application program running on client system 105. The web browser 122 and application 120 may be configured to allow a user to compose an abstract query, and to submit the query to the runtime component 114 for processing.

As illustrated in FIG. 1, server system 110 may further include runtime component 114, database management system (DBMS) 116, and database abstraction model 148. In one embodiment, these components may be provided using software applications executing on the server system 110. The DBMS 116 includes a software application configured to manage databases 214₁₋₃. That is, the DBMS 116 communicates with the underlying physical database system, and manages the physical database environment behind the database abstraction model 148. Users interact with the user interface 115 to compose and submit an abstract query to the runtime component 114 for processing.

In one embodiment, the runtime component 114 may be configured to receive an abstract query, and in response, to generate a “resolved” or “concrete” query that corresponds to the schema of underlying physical databases 214. For example, the runtime component 114 may be configured to generate one or more Structured Query Language (SQL) statements from an abstract query. The resolved queries generated by the runtime component 114 are supplied to DBMS 116 for execution. Additionally, the runtime component 114 may be configured to modify the resolved query with additional restrictions or conditions, based on the focus of the abstract query, i.e., based on the model entity specified for a given query.

FIG. 2A illustrates a plurality of interrelated components of the invention, along with relationships between the logical view of data provided by the database abstraction model environment (the left side of FIG. 2A), and the underlying physical database environment used to store the data (the right side of FIG. 2A).

In one embodiment, the database abstraction model 148 provides definitions for a set of logical fields 208, model entities 225, and relevant fields 229. Users compose an abstract query 202 by specifying logical fields 208 to include in selection criteria 203 and results criteria 204. An abstract query 202 may also identify a model entity 201 from the set of model entities 225. The resulting query is generally referred to herein as an “abstract query” because it is composed using logical fields 208 rather than direct references to data structures in the underlying physical databases 214. The model entity 225 may be used to indicate the focus of the abstract query 202 (e.g., a “patient,” a “person,” an “employee,” a “test,” a “facility,” etc.). For example, abstract query 202 includes an indication that the query is directed to instances of the “patient” model entity 201, and further includes selection criteria 203 indicating that patients with a “hemoglobin_test>20” should be retrieved. The selection criteria 203 are composed by specifying a condition evaluated against the data values corresponding to a logical field 208 (in this case, the “hemoglobin_test” logical field). The operators in a condition typically include comparison operators such as =, >, <, >=, or <=, and logical operators such as AND, OR, and NOT. Results criteria 204 indicate that data retrieved for this abstract query 202 includes data for the “name,” “age,” and “hemoglobin_test” logical fields 208.
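To make the pieces of abstract query 202 concrete, the following hypothetical Python sketch models a query as a model entity plus selection criteria and results criteria, and evaluates it against in-memory rows. In the described system the runtime component would instead resolve the query into SQL; the class names, the operator table, and the sample rows are illustrative assumptions.

```python
import operator
from dataclasses import dataclass, field


@dataclass
class Condition:
    logical_field: str   # name of a logical field, e.g. "hemoglobin_test"
    op: str              # comparison operator, e.g. ">"
    value: object


@dataclass
class AbstractQuery:
    model_entity: str                                          # focus of the query
    selection: list[Condition] = field(default_factory=list)  # selection criteria
    results: list[str] = field(default_factory=list)          # results criteria


COMPARATORS = {"=": operator.eq, ">": operator.gt, "<": operator.lt,
               ">=": operator.ge, "<=": operator.le}


def evaluate(query: AbstractQuery, rows: list[dict]) -> list[dict]:
    """Return the results-criteria fields of every row meeting all selection conditions."""
    matched = []
    for row in rows:
        if all(COMPARATORS[c.op](row[c.logical_field], c.value) for c in query.selection):
            matched.append({name: row[name] for name in query.results})
    return matched


query = AbstractQuery(
    model_entity="patient",
    selection=[Condition("hemoglobin_test", ">", 20)],
    results=["name", "age", "hemoglobin_test"],
)
rows = [
    {"name": "A. Smith", "age": 54, "hemoglobin_test": 22},
    {"name": "B. Jones", "age": 37, "hemoglobin_test": 14},
]
print(evaluate(query, rows))
# [{'name': 'A. Smith', 'age': 54, 'hemoglobin_test': 22}]
```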

As stated, in one embodiment, an abstract query may specify a type of model entity being queried (e.g., a patient, an employee, or a test). That is, a model entity defines the focus, or central concept, for an abstract query. Rather than composing a query based on the structure of an underlying database (e.g., an SQL schema), users compose a query about a model entity (e.g., about a patient) by specifying which logical fields should be used to evaluate whether a given instance of the model entity should be included in the query results. Doing so allows users to compose complex queries in a straightforward and intuitive manner. However, not all logical fields 208 are typically related to each of the model entities 225. Thus, an interface that presents a complete collection of logical fields 208 to a user may become cluttered and confusing at best or, worse, may allow users to compose an abstract query that cannot be resolved into a corresponding physical query. The use of model entities to provide a focus for abstract queries is described in commonly assigned U.S. Pat. No. 7,054,877 (the '877 patent) entitled “Dealing with Composite Data through Data Model Entities.”

In one embodiment, relevant fields 229 include each logical field 208 of database abstraction model 148 that is relevant to a given model entity 225. As used herein, “relevant fields” are logical fields 208 that store data related to a given model entity 225 and are available to include in an abstract query 202 directed to that model entity 225. The particular logical fields 208 that are available may include the complete set of “relevant fields” but may also include a subset of those logical fields. For example, logical fields associated with a model entity may be marked as unavailable in a given case because, based on a user profile, the database resources they reference are unavailable to the user composing an abstract query. That is, the user may be authorized to compose a query regarding a given model entity, but may not be authorized to access everything about that entity available through the relevant logical fields. Similarly, if database resources are unavailable or under development, logical fields that reference such resources may be marked as unavailable. As another example, logical fields may be made unavailable when their use in a given abstract query would exceed the system resources (or the resources allocated to a given user) from a complexity or performance perspective.

Thus, in one embodiment, a user of query interface 115 composing an abstract query 202 for a specific model entity 225 may be presented with only the relevant fields 229 that correspond to that model entity 225.

In one embodiment, relevant fields 229 may be generated at start-up time for a computer system (e.g., server system 110). Alternatively, relevant fields 229 may be generated periodically (e.g., daily, weekly, monthly, etc.) or whenever a change is made to a related part of database abstraction model 148, for example, adding/modifying a logical field, adding/modifying a model entity, adding/modifying a relationship, etc.

In another embodiment, relevant fields 229 may be generated when a given user logs in or when the user composes an abstract query directed to a particular model entity. In such a case, the relevant logical fields may be evaluated dynamically based on a user profile, and only the logical fields associated with the model entity (and not marked unavailable) are presented to the user (e.g., as part of query interface 115). Relevant fields 229 may be stored using any suitable technique, for example, in a database table, in an XML data file, and the like.
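A minimal sketch of computing the fields presented to a user is shown below, assuming relevant fields are held in a simple dictionary keyed by model entity and that unavailability is expressed as sets of field names. The RELEVANT_FIELDS contents (including the SSN and TestDate fields) and the function name are hypothetical.

```python
# Hypothetical relevant-field sets per model entity.
RELEVANT_FIELDS = {
    "Patient": {"FirstName", "LastName", "Age", "hemoglobin_test", "SSN"},
    "Test": {"hemoglobin_test", "TestDate"},
}


def available_fields(entity, restricted_for_user=frozenset(), unavailable_resources=frozenset()):
    """Return the relevant fields for a model entity, minus fields the user's
    profile does not authorize and fields backed by unavailable resources."""
    relevant = RELEVANT_FIELDS.get(entity, set())
    return sorted(relevant - set(restricted_for_user) - set(unavailable_resources))


print(available_fields("Patient", restricted_for_user={"SSN"}))
# ['Age', 'FirstName', 'LastName', 'hemoglobin_test']
print(available_fields("Test", unavailable_resources={"TestDate"}))
# ['hemoglobin_test']
```

Recomputing this per login matches the dynamic embodiment; caching the result and invalidating it when the abstraction model changes would correspond to the periodic embodiment.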

In one embodiment, runtime component 114 retrieves data from physical database 214 by generating a resolved query (e.g., an SQL statement) from abstract query 202. Because database abstraction model 148 is not tied to either the schema of physical database 214 or the syntax of a particular query language, additional capabilities may be provided by database abstraction model 148 without having to modify the underlying database. Further, depending on the access method specified for a logical field, runtime component 114 may transform abstract query 202 into an XML query that queries data from database 214₁, an SQL query of relational database 214₂, or another query composed according to another physical storage mechanism using other data representation 214₃, or combinations thereof (whether currently known or later developed).

FIGS. 2B-2D illustrate an exemplary abstract query 202, relative to the database abstraction model 148, according to one embodiment of the invention. As shown in FIG. 2B, abstract query 202 includes selection criteria 203 indicating that the query should retrieve instances of the patient model entity 201 with a “hemoglobin” test value greater than “20.” The particular information retrieved using abstract query 202 is specified by result criteria 204. In this example, the abstract query 202 retrieves a patient's name and a test result value for a hemoglobin test. The actual data retrieved may include data from multiple tests. That is, the query results may exhibit a one-to-many relationship between a particular model entity and the query results.

An illustrative abstract query corresponding to abstract query 202 is shown in Table I below. In this example, the abstract query 202 is represented using eXtensible Markup Language (XML). In one embodiment, query interface 115 may be configured to enable a user to compose an abstract query, and to generate an XML document to represent the finished abstract query. Those skilled in the art will recognize that XML is a well-known markup language used to facilitate the sharing of structured text and information; however, other markup languages may be used.

TABLE I
Query Example
001 <?xml version="1.0"?>
002 <!-- Query string representation: ("Hemoglobin_test > 20") -->
003 <QueryAbstraction>
004   <Selection>
005     <Condition>
006       <Condition field="Hemoglobin Test" operator="GT" value="20"/>
007     </Condition>
008   </Selection>
009   <Results>
010     <Field name="FirstName"/>
011     <Field name="LastName"/>
012     <Field name="hemoglobin_test"/>
013   </Results>
014   <Entity name="Patient"><EntityField>
015       <FieldRef name="data://patient/PID"/>
016       <Usage type="query"/>
017     </EntityField>
018   </Entity>
019 </QueryAbstraction>

The XML markup shown in Table I includes the selection criteria 203 (lines 004-008) and the results criteria 204 (lines 009-013). Selection criteria 203 include a field name (for a logical field), a comparison operator (=, >, <, etc.), and a value expression (what the field is being compared to). In one embodiment, the results criteria 204 include a set of logical fields for which data should be returned. The actual data returned is consistent with the selection criteria 203. Lines 014-018 identify the model entity selected by a user, in this example, a “Patient” model entity. Thus, the query results returned for abstract query 202 are instances of the “Patient” model entity. Line 015 indicates the identifier in the physical database 214 used to identify instances of the model entity. In this case, instances of the “Patient” model entity are identified using values from the “Patient ID” column of a patient table.
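As a sketch of how a consumer might read an abstract query like the one in Table I, the following Python uses the standard xml.etree.ElementTree module to extract the selection condition, the results fields, and the model entity identifier. It assumes a well-formed version of the Table I markup (without the line numbers); the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# A well-formed version of the Table I abstract query, without line numbers.
TABLE_I = """<?xml version="1.0"?>
<!-- Query string representation: ("Hemoglobin_test > 20") -->
<QueryAbstraction>
  <Selection>
    <Condition>
      <Condition field="Hemoglobin Test" operator="GT" value="20"/>
    </Condition>
  </Selection>
  <Results>
    <Field name="FirstName"/>
    <Field name="LastName"/>
    <Field name="hemoglobin_test"/>
  </Results>
  <Entity name="Patient">
    <EntityField>
      <FieldRef name="data://patient/PID"/>
      <Usage type="query"/>
    </EntityField>
  </Entity>
</QueryAbstraction>"""

root = ET.fromstring(TABLE_I)

# Leaf <Condition> elements carry field/operator/value; the outer one only groups them.
conditions = [
    (c.get("field"), c.get("operator"), c.get("value"))
    for c in root.iter("Condition")
    if c.get("field") is not None
]
results = [f.get("name") for f in root.find("Results").findall("Field")]
entity = root.find("Entity")
entity_name = entity.get("name")
identifier = entity.find("EntityField/FieldRef").get("name")

print(conditions)               # [('Hemoglobin Test', 'GT', '20')]
print(results)                  # ['FirstName', 'LastName', 'hemoglobin_test']
print(entity_name, identifier)  # Patient data://patient/PID
```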

After composing an abstract query, a user may submit it to runtime component 114 for processing. In one embodiment, runtime component 114 may be configured to process abstract query 202 by generating an intermediate representation of abstract query 202, such as an abstract query plan. In one embodiment, an abstract query plan is composed from a combination of abstract elements from the data abstraction model and physical elements relating to the underlying physical database. For example, an abstract query plan may identify which relational tables and columns are referenced by which logical fields included in abstract query 202. It may also identify how to join columns of data together. Runtime component 114 may then parse the intermediate representation in order to generate a physical query of the underlying physical database (e.g., one or more SQL statements). Abstract query plans and query processing are further described in commonly assigned U.S. Pat. No. 7,461,052 (the '052 patent), entitled “Abstract Query Plan.”

FIG. 2B further illustrates an embodiment of a database abstraction model 148 that includes a plurality of logical field specifications 208₁₋₅ (five shown by way of example). The access methods included in logical field specifications 208 (or logical fields, for short) are used to map the logical fields 208 to tables and columns in an underlying relational database (e.g., database 214₂ shown in FIG. 2A). As illustrated, each field specification 208 identifies a logical field name 210₁₋₅ and an associated access method 212₁₋₅. Depending upon the different types of logical fields, any number of access methods may be supported by database abstraction model 148. FIG. 2B illustrates access methods for simple fields, filtered fields, and composed fields. Each of these three access methods is described below.

A simple access method specifies a direct mapping to a particular entity in the underlying physical database. Field specifications 208₁, 208₂, and 208₅ each provide a simple access method, 212₁, 212₂, and 212₅, respectively. For a relational database, the simple access method maps a logical field to a specific database table and column. For example, the simple field access method 212₁ shown in FIG. 2B maps the logical field name 210₁ “FirstName” to a column named “f_name” in a table named “Demographics.”
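Below is a minimal sketch of a simple access method. It assumes that a logical field is represented by a hypothetical SimpleAccess record holding a table name and a column name. An in-memory SQLite database, accessed through Python's sqlite3 module, stands in for relational database 214₂. The patient_id column and the sample rows are invented for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleAccess:
    """Simple access method: a logical field maps directly to one table column."""
    table: str
    column: str

    def to_sql(self) -> str:
        return f"SELECT {self.column} FROM {self.table}"


# Hypothetical field registry using the names from FIG. 2B.
FIELDS = {"FirstName": SimpleAccess("Demographics", "f_name")}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Demographics (patient_id INTEGER, f_name TEXT)")
conn.executemany("INSERT INTO Demographics VALUES (?, ?)",
                 [(101, "Mary"), (102, "John")])

sql = FIELDS["FirstName"].to_sql()
first_names = sorted(name for (name,) in conn.execute(sql))
print(sql)          # SELECT f_name FROM Demographics
print(first_names)  # ['John', 'Mary']
```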

Logical field specification 208₃ exemplifies a filtered field access method 212₃. Filtered access methods identify an associated physical database and provide rules defining a particular subset of items within the underlying database that should be returned for the filtered field. Consider, for example, a relational table storing test results for a plurality of different medical tests. Logical fields corresponding to each different test may be defined, and a filter for each different test is used to associate a specific test with a logical field. For example, logical field 208₃ illustrates a hypothetical “Hemoglobin Test.” The access method for this filtered field 212₃ maps to the “Test_Result” column of a “Tests” table and defines a filter “Test_ID=‘1243’.” Only data that satisfies the filter is returned for this logical field. Accordingly, the filtered field 208₃ returns a subset of data from a larger set. The user does not need to know the specifics of how the data is represented in the underlying physical database, or to specify the selection criteria as part of the query building process.
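Extending the same idea, a filtered access method can be sketched as a table, a column, and a filter predicate appended as a WHERE clause. The FilteredAccess class, the patient_id column, and the sample rows are assumptions made for illustration. Only the “Tests” table, the “Test_Result” column, and the Test_ID='1243' filter come from the example above.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class FilteredAccess:
    """Filtered access method: a column plus a filter selecting a subset of rows."""
    table: str
    column: str
    filter_sql: str  # predicate taken from the field specification

    def to_sql(self) -> str:
        return f"SELECT {self.column} FROM {self.table} WHERE {self.filter_sql}"


# "Hemoglobin Test" logical field (table, column, and filter from the text).
hemoglobin_test = FilteredAccess("Tests", "Test_Result", "Test_ID = '1243'")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tests (patient_id INTEGER, Test_ID TEXT, Test_Result REAL)")
conn.executemany("INSERT INTO Tests VALUES (?, ?, ?)",
                 [(101, "1243", 22.5), (101, "9999", 7.0), (102, "1243", 14.1)])

# Only rows passing the filter are returned; the unrelated test 9999 is skipped.
values = sorted(result for (result,) in conn.execute(hemoglobin_test.to_sql()))
print(values)  # [14.1, 22.5]
```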

Field specification 208₄ exemplifies a composed access method 212₄. Composed access methods generate a return value by retrieving data from the underlying physical database and performing operations on the data. In this way, information that does not directly exist in the underlying data representation may be computed and provided to a requesting entity. For example, logical field access method 212₄ illustrates a composed access method that maps the logical field “age” 208₄ to another logical field 208₅ named “birthdate.” In turn, the logical field “birthdate” 208₅ maps to a column in a demographics table of relational database 214₂. In this example, data for the “age” logical field 208₄ is computed in two steps. First, data is retrieved from the underlying database using the “birthdate” logical field 208₅. Second, the birth date value is subtracted from a current date value to calculate the age value returned for the logical field 208₄. Another example is a “name” logical field (not shown) composed from the first name and last name logical fields 208₁ and 208₂.
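The following sketch shows one way the composed “age” field could be derived from the “birthdate” field. The birthdate_field and age_field functions and the BIRTHDATES dictionary are hypothetical stand-ins for the access methods 212₅ and 212₄. The sketch also assumes that the age should be one less when the birthday has not yet occurred in the current year, a case that a plain year subtraction would miss.

```python
from datetime import date

# Hypothetical stand-in for the "birthdate" logical field: patient ID -> birth date.
BIRTHDATES = {101: date(1980, 6, 15), 102: date(1990, 1, 1)}


def birthdate_field(patient_id: int) -> date:
    return BIRTHDATES[patient_id]


def age_field(patient_id: int, today: date) -> int:
    """Composed access method: derive age from the birthdate logical field."""
    born = birthdate_field(patient_id)
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1)


print(age_field(101, date(2024, 6, 14)))  # 43
print(age_field(101, date(2024, 6, 15)))  # 44
```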

By way of example, the field specifications 208 shown in FIG. 2B are representative of logical fields mapped to data represented in the relational data representation 214₂. However, other instances of database abstraction model 148, or other logical field specifications, may map to other physical data representations (e.g., databases 214₁ or 214₃ illustrated in FIG. 2A). Further, in one embodiment, database abstraction model 148 is stored on computer system 110 using an XML document. This document describes the model entities, logical fields, access methods, and additional metadata that, collectively, define the database abstraction model for a particular physical database system. Other storage mechanisms or markup languages, however, are also contemplated.

Referring to FIG. 2C, database abstraction model 148 also includes model entities 225. Illustratively, only a single model entity 225 is shown, for the model entity “Patient.” As shown, model entity 225 includes a set of relationships 226 which identify data available in database 214 that is related to instances of the “Patient” model entity. For example, the first model entity relationship 226 indicates that data from a “Demographics” table and a “Lineage” table are linked by columns named “Patient ID.” Further, the second model entity relationship 226 indicates that data from the “Demographics” table and a “Tests” table are linked by columns named “Patient ID.” Collectively, relationships 226 define the “universe” of data about the model entity 225 stored in the underlying physical database 214. That is, relationships 226 specify what physical tables and fields are accessible for a given model entity 225.
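As a rough sketch, the relationships 226 can be treated as edges of a graph whose nodes are tables. Walking that graph from the “Demographics” table yields the set of tables reachable for the “Patient” model entity, and the same edges can be rendered as SQL join text. The RELATIONSHIPS list and the accessible_tables and join_clause functions are hypothetical. join_clause also assumes the relationships are listed so that each left-hand table has already been joined.

```python
from collections import deque

# Hypothetical encoding of the two relationships of FIG. 2C:
# (table, related table, shared join column).
RELATIONSHIPS = [
    ("Demographics", "Lineage", "Patient ID"),
    ("Demographics", "Tests", "Patient ID"),
]


def accessible_tables(start, relationships):
    """Breadth-first walk over the relationships: the model entity's 'universe'."""
    graph = {}
    for left, right, _ in relationships:
        graph.setdefault(left, set()).add(right)
        graph.setdefault(right, set()).add(left)
    seen, queue = {start}, deque([start])
    while queue:
        for neighbour in graph.get(queue.popleft(), ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def join_clause(start, relationships):
    """FROM-clause text joining every related table on its shared column."""
    parts = [start]
    for left, right, column in relationships:
        parts.append(f'JOIN {right} ON {left}."{column}" = {right}."{column}"')
    return " ".join(parts)


print(sorted(accessible_tables("Demographics", RELATIONSHIPS)))
# ['Demographics', 'Lineage', 'Tests']
```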

Referring to FIG. 2D, database abstraction model 148 also includes relevant fields 229. As shown, relevant fields 229 correspond to the “Patient” model entity 225 and include the logical fields “FirstName,” “LastName,” “Birthdate” and “Age.” As described above, relevant fields 229 may identify the logical fields 208 of database abstraction model 148 that are relevant to a given model entity 225. In one embodiment, relevant fields 229 may be generated from relationships 226 and logical fields 208.

FIG. 3 illustrates a relational view 300 of software components for executing an abstract query, according to one embodiment of the invention. The software components of relational view 300 include a user interface 115, an application 310, the runtime component 114, a database management system (DBMS) 116, database 214, and database abstraction model 148.

As shown, the application 310 includes an abstract query 202. Illustratively, the abstract query 202 is created in the user interface 115, such as a graphical user interface (GUI). Note, however, that the user interface 115 is shown only by way of example; any suitable requesting entity may create abstract query 202 (e.g., the application 310, an operating system, or an end user).

In one embodiment, the abstract query 202 is translated by the runtime component 114 into a resolved query 302. This translation is performed using the database abstraction model 148, as described above with reference to FIGS. 2A-2D. The resolved query 302 is submitted to the DBMS 116 for execution against the database 214, producing a set of query results 312. The query results 312 may be presented to a user (e.g., in user interface 115), or may be used for further processing (e.g., as inputs for rule processing).

FIG. 4 is a flow diagram illustrating method 400 for processing a query with a dictionary term criteria condition, according to one embodiment of the invention. As described above, such a dictionary term criteria condition is used to request results for which each of multiple requested terms is present in a distinct document (or other text source).

As shown, method 400 begins at step 405, where a user composes an abstract query and submits it for execution. The abstract query may include any number of query conditions and result specifications composed using the logical fields of a data abstraction model. Further, the query may include at least one dictionary term criteria condition, specifying a set of terms and a logical field identifying the documents in which to search for the terms. The query may also specify a model entity (a focus) from a set of model entities defined by the data abstraction model. In one embodiment, the abstract query is composed by a user interacting with a user interface. Alternatively, such information may be provided from execution of another query, by a software component, and so on.

At step 410, the abstract query is transformed into a resolved query, i.e., a query suitable for executing on a set of physical database systems underlying a data abstraction model. The resolved query may be executed to return instances of the model entity (and the data requested for each instance). Consider an example in which:
the terms are ‘Disease 1, Disease 2 . . . Disease N’;
the model entity is ‘Patient’; and
the logical field for the dictionary term criteria condition identifies a column of ‘Patient Visit Notes.’
In this case, an instance of the “Patient” model entity (identified, for example, by a patient ID) satisfies the condition when the ‘Patient Visit Notes’ documents for that particular patient contain at least one record for each of the terms. That is, a particular patient is identified in query results when the patient has a “Patient Visit Note” that includes the term “Disease 1,” and separate “Patient Visit Notes” that include “Disease 2” up through “Disease N.” A more detailed example of the transforming step 410 is described below with respect to FIG. 5.
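The intended semantics of the condition can be illustrated in plain Python before any SQL is generated. This sketch assumes that “contains a term” means a case-insensitive substring match. It requires only that each term appear in at least one document belonging to the instance, so a single document may satisfy several terms. The satisfies and matching_instances functions and the sample notes are hypothetical.

```python
def satisfies(documents, terms):
    """True if every term occurs in at least one of the instance's documents."""
    lowered = [doc.lower() for doc in documents]
    return all(any(term.lower() in doc for doc in lowered) for term in terms)


def matching_instances(documents_by_instance, terms):
    """Instances (e.g., patient IDs) whose documents cover all the terms."""
    return sorted(
        pid for pid, docs in documents_by_instance.items() if satisfies(docs, terms)
    )


visit_notes = {
    101: ["Diagnosed with Disease 1", "Follow-up: Disease 2 confirmed"],
    102: ["Disease 1 only"],
    103: ["Disease 2 suspected", "Disease 1 ruled in"],
}
print(matching_instances(visit_notes, ["Disease 1", "Disease 2"]))  # [101, 103]
```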

Method 400 terminates with step 415, where the resulting resolved query is stored, for example, in the memory of a computer. In one embodiment, the resolved query is stored only for a short period of time, for example, until it can be submitted for execution to an underlying database system. In another embodiment, the resolved query (or components thereof) is stored for a longer period of time, for example, for use in composing other queries.

FIG. 5 is a flow diagram illustrating method 500 for processing an abstract query, according to one embodiment of the invention. As shown, the method 500 begins at step 505, where one of the terms of an abstract dictionary term criteria condition is selected. At step 510, a look-up for the selected term is created that references the underlying data source identified by a logical field of the dictionary term criteria condition. For example, assume the following:
the dictionary term criteria condition specifies a set of terms for a logical field referencing “patient notes”; and
this logical field points to a column in a database table, where each entry in the column stores a patient note for a patient (identified by a Patient ID value in another column of the database table).
In such a case, a look-up is generated against this database table for one of the terms in the dictionary term criteria condition. At step 515, if more dictionary terms (i.e., keywords) were included in the query, the method returns to step 505. There, another term is selected and another look-up for that term is generated for a resolved query. Importantly, therefore, a separate look-up is created for each term identified in the dictionary term criteria condition. In other words, in one embodiment, steps 505-515 are repeated until look-ups have been created for each of the terms identified by the dictionary term criteria condition.
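A sketch of steps 505-515 follows, using SQLite through Python's sqlite3 module. It assumes a hypothetical Notes table (doc_id, patient_id, note) standing in for the “patient notes” column. The SQL LIKE operator (case-insensitive for ASCII text in SQLite by default) serves as a simple stand-in for whatever text search the underlying database actually provides. Each term gets its own look-up over its own aliased instance of the table.

```python
import sqlite3

# Hypothetical stand-in for the "patient notes" column: one row per document.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Notes (doc_id INTEGER, patient_id INTEGER, note TEXT)")
conn.executemany("INSERT INTO Notes VALUES (?, ?, ?)", [
    (1, 101, "Worried about taxes"),
    (2, 101, "Hiccupping since Monday"),
    (3, 101, "Patient satisfied with care"),
    (4, 102, "Taxes and hiccupping discussed"),
    (5, 103, "Taxes, hiccupping; satisfied"),
])

TERMS = ["taxes", "hiccupping", "satisfied"]


def lookup_sql(alias):
    """Look-up for one term over its own aliased instance of Notes (step 510)."""
    return (f"SELECT DISTINCT {alias}.patient_id FROM Notes {alias} "
            f"WHERE {alias}.note LIKE ?")


# Steps 505-515: select each term in turn and run a separate look-up for it.
hits = {}
for i, term in enumerate(TERMS, start=1):
    rows = conn.execute(lookup_sql(f"t{i}"), (f"%{term}%",))
    hits[term] = {patient_id for (patient_id,) in rows}

print(sorted(hits["satisfied"]))  # [101, 103]
```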

For example, FIG. 7A illustrates a resolved query 642 generated in this manner. In particular, the terms identified by the dictionary term criteria condition include “taxes,” “hiccupping,” and “satisfied.” In this example, the sub-queries 715 represent the result of repeating steps 505-515. Importantly, each look-up uses a separate instance of the table to search for one of the keyword terms.

Returning to FIG. 5, at step 520, a portion of the resolved query is generated to evaluate the output of the created look-ups (i.e., of sub-queries 715). More specifically, sets of results are selected, where each set includes results from one of the created look-ups. The sets of results may further all be associated with a particular instance of the model entity identified by the abstract query. For example, if the model entity is a ‘patient,’ an instance of the model entity is a particular patient (e.g., identified by a ‘Patient ID’). The corresponding set then includes the documents containing references to that particular patient.

In one embodiment, the output of a look-up for a particular term is represented by a numeric value. A value greater than zero indicates that the term is in the documents (e.g., in at least one of the patient notes), and zero indicates that the term is not found within the documents. The result of step 520 may be seen in FIG. 7A, identified as item 705. Note that the result may effectively represent a matrix of all documents and all the terms found.

At step 525, a portion of the resolved query is generated to group the results based on the model entity (e.g., by patient ID value). Further, as shown in FIG. 7A at 720, a condition may be added to return only those instances of the model entity (e.g., patients) that have all the terms (e.g., ‘taxes,’ ‘hiccupping,’ and ‘satisfied’) in the sources associated with the logical field.
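Steps 520 and 525 can be sketched as a single SQL statement, built in three parts:
each per-term look-up contributes rows flagged with a 0/1 indicator for its own term, a rough analogue of the document-by-term matrix 705;
the rows are grouped by patient ID; and
a HAVING clause keeps only patients whose indicator sum is greater than zero for every term.
This is an illustrative assumption about the query's shape, not a reproduction of FIG. 7A. The Notes table, the resolved_query function, and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Notes (doc_id INTEGER, patient_id INTEGER, note TEXT)")
conn.executemany("INSERT INTO Notes VALUES (?, ?, ?)", [
    (1, 101, "Worried about taxes"),
    (2, 101, "Hiccupping since Monday"),
    (3, 101, "Patient satisfied with care"),
    (4, 102, "Taxes and hiccupping discussed"),
    (5, 103, "Taxes, hiccupping; satisfied"),
])


def resolved_query(terms):
    """Build a GROUP BY/HAVING query requiring a hit for every term."""
    n = len(terms)
    branches = []
    for i in range(n):
        # Look-up i scans its own table instance and sets only its own flag.
        flags = ", ".join(f"{int(j == i)} AS hit{j + 1}" for j in range(n))
        branches.append(f"SELECT t{i + 1}.patient_id AS patient_id, {flags} "
                        f"FROM Notes t{i + 1} WHERE t{i + 1}.note LIKE ?")
    # Step 525: group by the model entity; keep patients with a hit per term.
    having = " AND ".join(f"SUM(hit{j + 1}) > 0" for j in range(n))
    sql = ("SELECT patient_id FROM (" + " UNION ALL ".join(branches) + ") AS m "
           f"GROUP BY patient_id HAVING {having} ORDER BY patient_id")
    return sql, [f"%{t}%" for t in terms]


sql, params = resolved_query(["taxes", "hiccupping", "satisfied"])
matches = [patient_id for (patient_id,) in conn.execute(sql, params)]
print(matches)  # [101, 103]
```

Patient 102 is excluded because none of its notes mention “satisfied.” Patient 103 qualifies even though a single note contains all three terms.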

The process described above in steps 505 through 525 illustrates an approach for creating a resolved query for a dictionary term criteria condition. However, it may be desirable to have additional conditions in a final query, including other multiple-occurrence conditions. In other words, the multiple-occurrence condition may need to be included within another set of conditions, such as ‘othercondition1,’ ‘othercondition2,’ and ‘othercondition3’ shown in FIG. 7A. For example, a simple condition of “Patient age > 35” could be specified in an abstract query. In such a case, a resolved query may be generated to identify patients older than 35, and the results of such a query could be intersected with the results of the resolved query shown in FIG. 7A. Accordingly, at steps 530 and 535, the resulting multiple-occurrence condition may be linked to such additional conditions to build the final query. In FIG. 7A, such linking is shown in the portion of the “WHERE” clause of this example query that includes “other condition 1 AND other condition 2.”
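The linking of steps 530 and 535 can be sketched with the “Patient age > 35” example, using SQL INTERSECT to combine the two conditions. The Demographics.age column is a simplifying assumption; the database abstraction model described above would derive age from a birth date. The dictionary condition is written in a compact aggregate form for brevity, and all data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Notes (doc_id INTEGER, patient_id INTEGER, note TEXT)")
conn.execute("CREATE TABLE Demographics (patient_id INTEGER, age INTEGER)")
conn.executemany("INSERT INTO Notes VALUES (?, ?, ?)", [
    (1, 101, "Worried about taxes"),
    (2, 101, "Hiccupping since Monday"),
    (3, 101, "Patient satisfied with care"),
    (4, 102, "Taxes and hiccupping discussed"),
    (5, 103, "Taxes, hiccupping; satisfied"),
])
conn.executemany("INSERT INTO Demographics VALUES (?, ?)",
                 [(101, 42), (102, 50), (103, 30)])

TERMS = ["taxes", "hiccupping", "satisfied"]

# Dictionary term criteria condition (compact form): a hit for every term.
dictionary_sql = ("SELECT patient_id FROM Notes GROUP BY patient_id HAVING "
                  + " AND ".join("SUM(note LIKE ?) > 0" for _ in TERMS))
# Additional, ordinary condition from the abstract query ("Patient age > 35").
other_sql = "SELECT patient_id FROM Demographics WHERE age > ?"

# Steps 530-535: link the conditions; here they are simply intersected.
final_sql = f"{other_sql} INTERSECT {dictionary_sql} ORDER BY 1"
params = [35] + [f"%{t}%" for t in TERMS]
final = [patient_id for (patient_id,) in conn.execute(final_sql, params)]
print(final)  # [101]
```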

Though not necessary, in one embodiment, at step 540, the query is displayed to the user via, for example, a graphical user interface. The user may choose to run the query, or to modify it by modifying the corresponding abstract query.

FIG. 6 illustrates a window 600 of a graphical user interface (GUI) of a query application configured for composing queries with dictionary term criteria conditions, according to one embodiment of the invention. Illustratively, the window 600 includes four sections: a ‘Terms’ section 610 for selecting keywords; a ‘Model Entity’ section 620 for identifying a model entity; a ‘Logical Field’ section 630 for identifying where the terms associated with instances of the model entity should be searched; and a ‘Multiple Occurrence’ section 640 for displaying the resulting multiple-occurrence condition to the user. Window 600 also includes title 608, horizontal scrolling controls 602, vertical scrolling controls 604, and window closing control 606.

In one embodiment, ‘Terms’ section 610 shows terms already selected by a user, e.g., terms “taxes,” “hiccupping,” and “satisfied.” It also includes an additional control 618, labeled “select additional terms,” which the user selects to define additional terms of the dictionary term criteria condition. As discussed above, terms define limitations or criteria for data records that are returned for instances of the model entity. Upon selection of control 618, a pop-up window may be displayed to the user. The user may enter additional terms in the pop-up window or, alternatively, select an additional term from the available terms. When a new term is entered, it is displayed to the user in ‘Terms’ section 610 together with the previously selected terms.

Illustratively, ‘Model Entity’ section 620 shows a model entity selected by a user, e.g., “patient.” As discussed above, the model entity defines a focus of an abstract query. Accordingly, typically only one model entity would be selected by a user. However, in one embodiment, the user is provided with an option of selecting more than one model entity, via control 624, labeled “select additional model entity.” As with the ‘Terms’ section, upon the user selecting control 624, a pop-up window (not shown) is displayed. In this window, the user may enter an additional model entity or, alternatively, select one from the available model entities. Once the new model entity is selected, it is displayed to the user in the ‘Model Entity’ section 620 together with the previously selected model entities.

In one embodiment, ‘Logical Field’ section 630 shows logical fields selected by a user, e.g., “patient visit notes.” In the dictionary term criteria condition, logical fields generally define the data to which the resulting multiple-occurrence condition should be applied. In other words, the logical field specifies data sources, such as one or more documents for a given column, e.g., ‘Patient Visit Notes,’ which should be evaluated to process the dictionary term criteria condition. As described above, a logical field provides a representation of a specific set of data in an underlying database. According to one embodiment, the logical fields are defined independently of the underlying physical representation being used in the database. When more than one logical field is selected, such logical fields are linked. To add logical fields to the dictionary term criteria condition, in one embodiment, a user selects control 634, labeled “select additional logical field.” Upon the selection, a pop-up window may be displayed to the user. In this window, the user selects or enters additional logical fields, which are subsequently displayed to the user in ‘Logical Field’ section 630.

‘Multiple-occurrence condition’ section 640 displays a resulting query 642 having a dictionary term criteria condition composed based on the AMO condition created in sections 610, 620, and 630. FIGS. 7A and 7B illustrate examples of such multiple-occurrence queries. The user has an option of canceling the selections made and returning to building the abstract query by selecting control 644, labeled “CANCEL.” The user also has an option of incorporating the multiple-occurrence condition into the final query by selecting control 646, labeled “Build Query.” Upon selecting control 646, in one embodiment, the user is provided with an opportunity to add more conditions to the final query, including other dictionary term criteria conditions.

Note that window 600 is merely an example of the graphical user interface according to one embodiment, and other representations and implementations are possible. For example, not every user needs to see the resulting multiple-occurrence condition query 642; thus, in one embodiment, query 642 is not displayed. Alternatively, the multiple-occurrence condition query, or only the condition, may be displayed upon a specific request from the user. Further, in one embodiment, the user is required to define a logical field and a model entity before defining the terms, for example, to limit the selection of available terms.

As discussed above, FIGS. 7A and 7B illustrate different versions of a multiple-occurrence condition query 642, according to two different embodiments of the invention. In FIG. 7A, the lines representing the multiple-occurrence condition are shown between lines 705, while “othercondition1, othercondition2, and othercondition3” represent other, non-multiple-occurrence conditions of the query.

Query 642 of FIG. 7B is an optimized version of query 642 of FIG. 7A. In particular, to generate the query of FIG. 7B, term searches for a given patient are performed only when the previous terms for that patient have already been found. Checking for terms is pushed down into individual sub-queries, and a hit is required to exist directly in the joined table. In this manner, the need for the aggregation/HAVING clause of the query of FIG. 7A is removed. Consequently, the optimized query provides improved SQL performance and conserves resources, such as memory. A person skilled in the art will appreciate that other types of optimizations are also possible.
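One possible shape for an aggregation-free query in the spirit of FIG. 7B is a chain of correlated EXISTS tests, one per term. Each test probes its own instance of the table for the current patient. Whether the database evaluates the tests in order and stops at the first miss is up to its optimizer, so this sketch only approximates the behavior described above. The optimized_query function and the Notes table are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Notes (doc_id INTEGER, patient_id INTEGER, note TEXT)")
conn.executemany("INSERT INTO Notes VALUES (?, ?, ?)", [
    (1, 101, "Worried about taxes"),
    (2, 101, "Hiccupping since Monday"),
    (3, 101, "Patient satisfied with care"),
    (4, 102, "Taxes and hiccupping discussed"),
    (5, 103, "Taxes, hiccupping; satisfied"),
])


def optimized_query(terms):
    """Aggregation-free variant: one correlated EXISTS probe per term."""
    probes = " AND ".join(
        f"EXISTS (SELECT 1 FROM Notes t{i} "
        f"WHERE t{i}.patient_id = p.patient_id AND t{i}.note LIKE ?)"
        for i in range(1, len(terms) + 1)
    )
    sql = ("SELECT p.patient_id FROM (SELECT DISTINCT patient_id FROM Notes) AS p "
           f"WHERE {probes} ORDER BY p.patient_id")
    return sql, [f"%{t}%" for t in terms]


sql, params = optimized_query(["taxes", "hiccupping", "satisfied"])
matches = [patient_id for (patient_id,) in conn.execute(sql, params)]
print(matches)  # [101, 103]
```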

FIG. 8 illustrates window 800 of a graphical user interface of a query application configured for displaying results generated by executing a dictionary term criteria condition, according to one embodiment of the invention. As shown, window 800 includes a title 808, horizontal scrolling controls 802, vertical scrolling controls 804, and window closing control 806. The main section of window 800 lists results generated by executing a resolved query generated from an abstract query that included the dictionary term criteria condition. In the present example, results 714 included in the result list 710 are patient IDs 714₁-714_E of patients that have one or more records containing the terms, where at least one distinct record contains each term.

In one embodiment, the user is provided with an opportunity to see all documents containing at least one of the terms. For example, as shown, if a user clicks on a particular patient ID 714, a pop-up window containing table 720 is displayed to the user. The table 720 contains a column 722 listing all such documents and a column for each of the terms, e.g., column 724 for ‘taxes,’ column 726 for ‘hiccupping,’ and column 728 for ‘satisfied.’ Each of the columns contains indications of whether the respective term is present in each of the documents of column 722. For example, “Y” indicates that a record containing the term may be found for the patient in the given document, and “N” indicates otherwise.
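The per-patient table 720 can be sketched as a query that lists the patient's documents containing at least one term, with a “Y” or “N” column per term. The term_matrix function, the Notes table, and the sample rows are hypothetical, and LIKE again stands in for the database's actual text search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Notes (doc_id INTEGER, patient_id INTEGER, note TEXT)")
conn.executemany("INSERT INTO Notes VALUES (?, ?, ?)", [
    (1, 101, "Worried about taxes"),
    (2, 101, "Hiccupping since Monday"),
    (3, 101, "Patient satisfied with care"),
    (6, 101, "Routine visit, no complaints"),
    (5, 103, "Taxes, hiccupping; satisfied"),
])

TERMS = ["taxes", "hiccupping", "satisfied"]


def term_matrix(conn, patient_id, terms):
    """Rows of (doc_id, 'Y'/'N' per term) for documents matching any term."""
    like = [f"%{t}%" for t in terms]
    flags = ", ".join("CASE WHEN note LIKE ? THEN 'Y' ELSE 'N' END" for _ in terms)
    any_term = " OR ".join("note LIKE ?" for _ in terms)
    sql = (f"SELECT doc_id, {flags} FROM Notes "
           f"WHERE patient_id = ? AND ({any_term}) ORDER BY doc_id")
    # Placeholders bind in textual order: flag columns, patient ID, filter.
    return conn.execute(sql, like + [patient_id] + like).fetchall()


for row in term_matrix(conn, 101, TERMS):
    print(row)
# (1, 'Y', 'N', 'N')
# (2, 'N', 'Y', 'N')
# (3, 'N', 'N', 'Y')
```

Document 6 is omitted because it contains none of the terms.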

Note, however, that in one embodiment, only a representative sample of the documents is displayed to the user, because a single document may contain multiple terms and multiple documents may contain the same terms. In another embodiment, instead of displaying table 720, only a list of relevant documents is provided. In yet another embodiment, no pop-up windows are employed; rather, all the results are displayed directly in window 800. In another embodiment, the user is provided only with the list of instances of the model entity, e.g., patients.

Advantageously, as described herein, embodiments of the invention enable composing a multiple-occurrence condition query based on an abstract query defined by a user. In particular, the multiple-occurrence condition of the query provides for determining which instances of the abstract query's model entity satisfy the terms defined by the abstract query within the sources identified by the abstract query. Advantageously, the techniques described above allow finding an instance satisfying all the terms even when the records associated with the instance that contain two different terms are present only in one and the same source.

While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.

1. A computer-implemented method for processing an abstract query of an underlying physical database, comprising: providing a database abstraction model, wherein the database abstraction model provides (i) a plurality of logical fields that each specify an access method defining a method for accessing data associated with a respective logical field, and (ii) a plurality of model entities, wherein each model entity specifies a set of logical fields that map to data related to a respective model entity and specifies an identifier in the underlying database used to identify instances of the respective model entity; receiving an abstract query composed from one or more logical fields of the database abstraction model, wherein at least a first logical field in the abstract query is specified in a dictionary term criteria condition, wherein the dictionary term criteria condition includes a list of one or more keywords, wherein the access method specified by the first logical field maps the first logical field to a plurality of documents related to a given instance of the model entity, and wherein the dictionary term criteria condition is evaluated by determining whether the plurality of documents related to the given instance of the model entity includes at least a distinct document containing a respective one of the one or more keywords; generating, from the abstract query, a resolved query of the underlying physical database; and storing the resolved query for execution against the underlying physical database.
 2. The method of claim 1, further comprising: executing the resolved query to identify one or more instances of the model entity that satisfy the dictionary term criteria condition; and returning, to a requesting entity, an indication of the identified instances of the model entity.
 3. The method of claim 2, further comprising, presenting results of the executed query to the user in a graphical user interface on a display device.
 4. The method of claim 1, wherein each of the plurality of documents are stored as unstructured text in a column of a table in the underlying physical database, and wherein generating, from the abstract query, a resolved query of the underlying physical database comprises: generating, for each of the plurality of terms, an instance of the table; generating, for each instance of the table, a sub query configured to identify instances of the model entity for which the unstructured text contains a respective one of the keywords; generating an intersecting query configured to intersect the results of each of the sub queries to identify instances of the model entity which include at least one distinct document for each keyword.
 5. The method of claim 1, wherein the abstract query further specifies at least one condition composed from one or more logical fields of the data abstraction model.
 6. The method of claim 1, wherein the abstract query further specifies at least one result field specifying data stored by the underlying database to return for instances of the model entity that satisfy the dictionary term criteria condition.
 7. The method of claim 1, wherein the resolved query is composed using the SQL query language.
 8. A computer-readable storage medium containing a program which, when executed on a processor, performs an operation for processing an abstract query of an underlying physical database, the operation comprising: providing a database abstraction model, wherein the database abstraction model provides (i) a plurality of logical fields that each specify an access method defining a method for accessing data associated with a respective logical field, and (ii) a plurality of model entities, wherein each model entity specifies a set of logical fields that map to data related to a respective model entity and specifies an identifier in the underlying database used to identify instances of the respective model entity; receiving an abstract query composed from one or more logical fields of the database abstraction model, wherein at least a first logical field in the abstract query is specified in a dictionary term criteria condition, wherein the dictionary term criteria condition includes a list of one or more keywords, wherein the access method specified by the first logical field maps the first logical field to a plurality of documents related to a given instance of the model entity, and wherein the dictionary term criteria condition is evaluated by determining whether the plurality of documents related to the given instance of the model entity includes at least a distinct document containing a respective one of the one or more keywords; generating, from the abstract query, a resolved query of the underlying physical database; and storing the resolved query for execution against the underlying physical database.
 9. The computer-readable storage medium of claim 8, wherein the operation further comprises: executing the resolved query to identify one or more instances of the model entity that satisfy the dictionary term criteria condition; and returning, to a requesting entity, an indication of the identified instances of the model entity.
 10. The computer-readable storage medium of claim 9, wherein the operation further comprises, presenting results of the executed query to the user in a graphical user interface on a display device.
 11. The computer-readable storage medium of claim 8, wherein each of the plurality of documents are stored as unstructured text in a column of a table in the underlying physical database, and wherein generating, from the abstract query, a resolved query of the underlying physical database comprises: generating, for each of the plurality of terms, an instance of the table; generating, for each instance of the table, a sub query configured to identify instances of the model entity for which the unstructured text contains a respective one of the keywords; generating an intersecting query configured to intersect the results of each of the sub queries to identify instances of the model entity which include at least one distinct document for each keyword.
 12. The computer-readable storage medium of claim 8, wherein the abstract query further specifies at least one condition composed from one or more logical fields of the data abstraction model.
 13. The computer-readable storage medium of claim 8, wherein the abstract query further specifies at least one result field specifying data stored by the underlying database to return for instances of the model entity that satisfy the dictionary term criteria condition.
 14. The computer-readable storage medium of claim 8, wherein the resolved query is composed using the SQL query language.
 15. A system, comprising: a processor; and a memory storing an application, which, when executed by the processor, is configured to perform an operation for processing an abstract query of an underlying physical database, the operation comprising: providing a database abstraction model, wherein the database abstraction model provides (i) a plurality of logical fields that each specify an access method defining a method for accessing data associated with a respective logical field, and (ii) a plurality of model entities, wherein each model entity specifies a set of logical fields that map to data related to a respective model entity and specifies an identifier in the underlying database used to identify instances of the respective model entity, receiving an abstract query composed from one or more logical fields of the database abstraction model, wherein at least a first logical field in the abstract query is specified in a dictionary term criteria condition, wherein the dictionary term criteria condition includes a list of one or more keywords, wherein the access method specified by the first logical field maps the first logical field to a plurality of documents related to a given instance of the model entity, and wherein the dictionary term criteria condition is evaluated by determining whether the plurality of documents related to the given instance of the model entity includes at least a distinct document containing a respective one of the one or more keywords, generating, from the abstract query, a resolved query of the underlying physical database, and storing the resolved query for execution against the underlying physical database.
 16. The system of claim 15, wherein the operation further comprises: executing the resolved query to identify one or more instances of the model entity that satisfy the dictionary term criteria condition; and returning, to a requesting entity, an indication of the identified instances of the model entity.
 17. The system of claim 16, wherein the operation further comprises, presenting results of the executed query to the user in a graphical user interface on a display device.
 18. The system of claim 15, wherein each of the plurality of documents are stored as unstructured text in a column of a table in the underlying physical database, and wherein generating, from the abstract query, a resolved query of the underlying physical database comprises: generating, for each of the plurality of terms, an instance of the table; generating, for each instance of the table, a sub query configured to identify instances of the model entity for which the unstructured text contains a respective one of the keywords; generating an intersecting query configured to intersect the results of each of the sub queries to identify instances of the model entity which include at least one distinct document for each keyword.
 19. The system of claim 15, wherein the abstract query further specifies at least one condition composed from one or more logical fields of the data abstraction model.
 20. The system of claim 15, wherein the abstract query further specifies at least one result field specifying data stored by the underlying database to return for instances of the model entity that satisfy the dictionary term criteria condition.
 21. The system of claim 15, wherein the resolved query is composed using the SQL query language.